Personalized content delivery using peer-to-peer precaching

ABSTRACT

A method and apparatus for peer-to-peer video precaching is described. In one embodiment, the method comprises building a user profile, periodically checking for new content objects specified in the user profile, maintaining a database of available objects and the locations of said objects, and determining the download location of an object requested by a user.

RELATED APPLICATIONS

This application is a continuation of and claims priority to, and the benefit of, U.S. Non-provisional application Ser. No. 11/737,425, entitled “Personalized Content Delivery Using Peer-To-Peer Precaching”, filed on Apr. 19, 2007, which claims priority to, and the benefit of, U.S. Continuation application Ser. No. 09/660,991, entitled “Personalized Content Delivery Using Peer-To-Peer Precaching”, filed on Sep. 13, 2000, which claims priority to, and the benefit of, U.S. Continuation-In-Part application Ser. No. 09/566,068, entitled “Intelligent Content Precaching”, filed on May 5, 2000, all of which are incorporated herein by reference in their entirety for all purposes.

FIELD OF THE INVENTION

The present invention relates to the field of content precaching in a networked environment; more particularly, the present invention relates to peer-to-peer precaching of content, including bandwidth-intensive content.

BACKGROUND OF THE INVENTION

The World Wide Web (“web”) uses the client-server model to communicate information between clients and servers. Web servers are coupled to the Internet and respond to document requests from web clients. Web clients (e.g., web “browsers”) are programs that allow a user to simply access web documents located on web servers.

An example of a client-server system interconnected through the Internet may include a remote server system interconnected through the Internet to a client system. The client system may include conventional components such as a processor, a memory (e.g., RAM), a bus which couples the processor and memory, a mass storage device (e.g., a magnetic hard disk or an optical storage disk) coupled to the processor and memory through an I/O controller, and a network interface, such as a conventional modem. The server system also may include conventional components such as a processor, memory (e.g., RAM), a bus which couples the processor and memory, a mass storage device (e.g., a magnetic or optical disk) coupled to the processor and memory through an I/O controller, and a network interface, such as a conventional modem.

To define the addresses of resources on the Internet, the Uniform Resource Locator (URL) system is used. A URL is a descriptor that specifically defines a type of Internet resource and its location. To access an initial web document, the user enters the URL for a web document into a web browser program. Then, a request is made by a client system, such as a router or other network device, and is sent out over the network to a web server. Thus, the web browser sends a request to the server that has the web document using the URL. The web server responds to the request and sends the desired content back over the network to the requester. For example, the web server responds to the HTTP request by sending the requested object to the client. In many cases, the object is a plain text (ASCII) document written in HyperText Markup Language (HTML); however, the object may be a video clip, movie, or other bandwidth-intensive content.

A problem with the Internet is that it has limited bandwidth resources, and different points in the Internet may experience network congestion, resulting in poor performance, especially for bandwidth-intensive applications. The Internet backbone is often painfully slow. The bandwidth limitation is mainly due to one or more congested links between the web server and the client. Broadband access can help in solving the first mile problem but does not help if the congestion occurs deeper in the network.

High-quality on-demand video over the Internet has been promised for a long time now. Lately, the hype has increased due to the emerging deployment of broadband access technologies like digital subscriber line (DSL), cable modems, and fixed wireless. These technologies promise to bring full-motion, TV-quality video to consumers and businesses. Unfortunately, early adopters of the technology quickly discovered that they still cannot get video in any reasonable quality over the network. Certainly, broadband access improves the viewing experience; some websites targeted to broadband-connected customers provide movies with slightly higher resolution. However, the video remains as jerky and fuzzy as before, synchronization with the audio is poor, and it often requires tens of seconds of buffering before starting. Nobody would seriously consider this to be an alternative to DVD or analog TV.

Providing video over the Internet is difficult because video requires huge amounts of bandwidth, even by today's standards. MPEG-4-compressed NTSC-quality video, for example, uses an average data rate of 1.2 Mbits/s, with peak rates as high as 3 Mbits/s. MPEG-2/DVD-quality video consumes 3.7 Mbits/s on average, with peaks up to 8 Mbits/s.

Most of today's broadband Internet links, especially those to small to medium-sized businesses (SMBs), typically provide data rates from the hundreds of Kbits/s up to 2 Mbits/s. Most residences get asymmetric digital subscriber line (ADSL) technology, which is typically provisioned at approximately 1 Mbits/s for downloads from the Internet and 128 Kbits/s for uploads. Often access links are shared among multiple users, which further reduces the bandwidth available to an individual.

While these data rates are expected to gradually increase in the long term, another phenomenon causing bandwidth shortage will remain: overprovisioning. Typically, Internet Service Providers (ISPs) overprovision their broadband links for economic reasons by a factor of ten. This means that if all their customers were to use the service simultaneously, each of those customers would get only 1/10th of the bandwidth they signed up for. While this scenario might sound unlikely, it is important to note that bandwidth will degrade during peak hours. The problem is better known from cable modems, where customers share a cable segment, but it applies to all broadband access technologies.

The network backbone can also be the bottleneck. Backbone peering points in particular are likely to impose low data rates, which slow down end-to-end network speed despite fast last mile technology. Even technology advances such as terabit routers, dense wavelength division multiplexing (DWDM), and faster transmission equipment will not help significantly if, as expected, Internet traffic keeps growing faster than these advances in technology.

One prior art solution to accommodate the slowness of the Internet backbone is to move content closer to individuals desiring the content. To that end, content may be cached on the carrier edge, and requests for such content may be serviced from these caches instead of by the web server. Distributing content in this manner can require large numbers of cache memories to be deployed at the carrier edge, and each cache memory stores content from a number of sites. When a request is made for content from a site that has been stored in one (or more) of the cache memories that is closer (from a network proximity viewpoint) to the requester than the original website, the request is satisfied from the cache. In such a situation, the interactive experience for text and images is improved significantly, but only if content from the site has been stored in the cache and the individual making the request is close enough to one of the servers supporting such a cache to satisfy requests with the content stored therein. This is referred to as carrier edge caching. One provider of such a service is Akamai. Also, such an arrangement for caching content requires that the content owner and the entity caching the content enter an agreement with respect to the access for that content so that the content can be stored ahead of time. Some of the providers of a carrier edge caching service use dedicated links (e.g., via satellites) to feed web pages and embedded objects to these servers and circumvent the Internet backbone entirely. Providing carrier edge caching for high-resolution video requires a particularly large number of servers to be deployed, since the number of clients each server can handle simultaneously is very small.

While carrier edge caching takes the load off the backbone and has the potential to significantly improve the end user's experience for text- and image-based content, there are two major shortcomings with this approach. First, it requires hardware infrastructure to be deployed on a giant scale. Without servers in all major ISPs' points of presence (POPs) and satellite receivers in central offices (COs), caching on the carrier edge does not work effectively. Deploying and maintaining this hardware infrastructure is very cost-intensive. Second, the last mile access link remains the bottleneck for affordable, truly high-resolution video for the foreseeable future.

Thus, high-quality video-on-demand in the strongest sense of the word might not be available for a while. However, despite all these limitations, a broadband access link of 500 Kbits/s can deliver more than 5 GBytes of data in 24 hours, which corresponds to 8 hours of NTSC-quality video, or 3 hours of DVD-quality video. That is more than most people, especially at work, ever watch.
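The arithmetic behind these figures can be checked with a few lines of Python. This is a back-of-envelope sketch that assumes decimal units (1 Kbit = 1000 bits, 1 GByte = 10^9 bytes) and the average data rates quoted earlier (1.2 Mbits/s for NTSC-quality MPEG-4, 3.7 Mbits/s for DVD-quality MPEG-2); the helper name `gbytes_delivered` is illustrative only.

```python
SECONDS_PER_HOUR = 3600

def gbytes_delivered(rate_bits_per_s: float, hours: float) -> float:
    """Data volume in GBytes (10**9 bytes) carried at a constant bit rate."""
    return rate_bits_per_s * hours * SECONDS_PER_HOUR / 8 / 1e9

# A 500 Kbits/s access link running for a full day.
link_day = gbytes_delivered(500e3, 24)
# 8 hours of NTSC-quality video at the 1.2 Mbits/s average rate.
ntsc_8h = gbytes_delivered(1.2e6, 8)
# 3 hours of DVD-quality video at the 3.7 Mbits/s average rate.
dvd_3h = gbytes_delivered(3.7e6, 3)

print(f"link/day: {link_day:.2f} GB, NTSC 8h: {ntsc_8h:.2f} GB, DVD 3h: {dvd_3h:.2f} GB")
```

Under these assumptions the link carries 5.4 GBytes per day, and both viewing budgets fit within it (about 4.3 and 5.0 GBytes respectively); peak rates and protocol overhead are ignored.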

SUMMARY OF THE INVENTION

A method and apparatus for peer-to-peer video precaching is described. In one embodiment, the method comprises a client receiving an indication from a controller that at least one new content object corresponding to content specified in a user profile is to be downloaded, the client receiving an indication of a location of the at least one content object from the controller, and downloading the content object from the location. Other features and advantages of the present invention will be apparent from the accompanying drawings and from the detailed description that follows below.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.

FIG. 1 illustrates a flow diagram of one embodiment of a process for precaching.

FIGS. 2, 3, 4, and 5 illustrate one embodiment of a precaching architecture.

FIG. 6 illustrates an exemplary protocol to facilitate precaching.

FIG. 7 is a block diagram of one embodiment of a computer system.

DETAILED DESCRIPTION

A method and apparatus for peer-to-peer content precaching is described. In the following description, numerous details are set forth. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.

Some portions of the detailed descriptions that follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic, or optical signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.

A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (“ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.

Overview

The precaching technique described herein involves building a user profile, subscribing for update notifications for new content (e.g., objects) based on information in the user profile, downloading the new content, and intercepting the user's requests to a web server to transparently return the content to the user. To do so, a controller maintains one or more databases of available objects and the locations of those objects. As new content becomes available, the controller searches the database of client profiles to determine the set of clients that will want a copy of the new content. The controller sends a message to each of the clients in the set instructing it to download the content. The message contains the location from which the object may be downloaded to the client. When the content has already been downloaded by a peer client, the controller may indicate to the requesting client that the peer client has the content and provide an indication that allows the requesting client to download the content from the peer client. Thus, in such a case, there is peer-to-peer precaching.
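The controller's matching step can be sketched in Python. This is a minimal, hypothetical illustration rather than the described implementation: it assumes profiles are lists of URL prefixes (for example, a site a user visits often), that the object database is an in-memory dict from object URL to the hosts holding a copy, and that a peer holding a copy is preferred over the origin server. The class and method names are invented for illustration.

```python
class Controller:
    """Hypothetical in-memory controller: client profiles plus object locations."""

    def __init__(self) -> None:
        # client id -> URL prefixes of interest (assumed profile format)
        self.profiles: dict[str, list[str]] = {}
        # object URL -> hosts known to hold a copy (origin first)
        self.locations: dict[str, list[str]] = {}

    def on_new_object(self, url: str, origin: str) -> list[tuple[str, str, str]]:
        """Record an available object; return (client, url, source) download instructions."""
        holders = self.locations.setdefault(url, [origin])
        instructions = []
        for client, interests in self.profiles.items():
            wanted = any(url.startswith(prefix) for prefix in interests)
            if not wanted or client in holders:
                continue  # not in this client's profile, or it already has a copy
            # Prefer a peer that already holds the object over the origin server.
            peers = [host for host in holders if host != origin]
            source = peers[0] if peers else origin
            instructions.append((client, url, source))
        return instructions
```

For example, with `profiles = {"c1": ["http://videos.example.com/"]}`, announcing `http://videos.example.com/clip.mpg` from `server1` yields one instruction to fetch from `server1`; once `c1` is recorded as a holder, a later interested client is pointed at `c1` instead.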

In an alternative embodiment, the controller checks for new content in response to a request by a client, by searching one or more databases to determine whether the content object has already been downloaded by a client in the system.

FIG. 1 is a flow diagram illustrating one embodiment of the content precaching process. The process may be performed by processing logic that comprises hardware, software, or a combination of both.

Referring to FIG. 1, the process begins with processing logic building a user profile (processing block 110). As described in more detail below, a user profile may be built by tracking user access patterns, receiving profile information from another entity, and/or being configured with a profile (or portion thereof) by a user.

Processing logic subscribes for an update of new content based on information in the user profile (processing block 120). In one embodiment, checking for new content includes periodically sending requests to a controller (e.g., a centralized master), which determines whether there are any new content objects that correspond to content specified in the profile. Whether new content exists may be identified by querying a master controller, subscribing with the master controller, and/or crawling the networked environment. Each of these is described in more detail below.

Processing logic receives an indication of the location of new content (processing block 130). Subsequently, processing logic downloads the new content from that location (processing block 140). Then processing logic intercepts a user's request to a web server and transparently returns the content to the user from local storage (e.g., a cache) instead of the original web server (processing block 150).

The content comprises objects (e.g., content objects) that may include web pages, video files, audio files, source code, executable code, programs (e.g., games), archives of one or more of these types of objects, databases, etc.

In one embodiment, the clients run on a platform and maintain profiles. A client may be an end point of a network or one or multiple hops away from the end point of a network (e.g., a local area network). By forwarding its profile to the controller and having the controller indicate when to download new content objects, the client is able to obtain and precache content objects, based on profiles, before they are requested. The content objects are stored in a precache memory while the network access link is not being used interactively.

When a web browser or other end-user program makes a request for a content object, the client intercepts the request and checks whether it has the content object stored locally. If it is stored locally, the client obtains the content and sends the content object to the browser or other end-user program using any inter-process communication (IPC) mechanism; in doing so, the object may simply be transferred to another task, process, or thread running on the same machine as the client, or it may travel over a local network (e.g., a LAN) to a different machine that is running the browser or end-user program. If the content object is not available locally, the client retrieves the object, or a low-quality representation of the object, over the wide area network from any server that hosts the content object.
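The interception logic amounts to a cache lookup with a network fallback. The sketch below is hypothetical: it models the precache memory as an in-memory dict keyed by URL and takes the wide area network fetch as an injected callable (`fetch_remote`), so the IPC transport and the HTTP client are left out. The hit and miss counters are added only to make the behavior observable.

```python
from typing import Callable

class PrecacheClient:
    """Hypothetical client-side interceptor backed by a local precache memory."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self.precache: dict[str, bytes] = {}   # URL -> precached object bytes
        self.fetch_remote = fetch_remote       # fallback over the wide area network
        self.hits = 0
        self.misses = 0

    def store(self, url: str, data: bytes) -> None:
        """Called when a precaching download completes."""
        self.precache[url] = data

    def handle_request(self, url: str) -> bytes:
        """Serve an intercepted request locally if possible, otherwise forward it."""
        data = self.precache.get(url)
        if data is not None:
            self.hits += 1
            return data
        self.misses += 1
        return self.fetch_remote(url)
```

In a real deployment `fetch_remote` might return a lower-quality representation of the object, as the text allows; here it is just any function from URL to bytes.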

One Embodiment of an Architecture for Content Precaching

FIGS. 2, 3, 4, and 5 illustrate one embodiment of an architecture for the content precaching described herein. Referring to FIG. 2, one or more content providers 202 (e.g., web servers, video servers, audio servers, etc.) are coupled to the Internet 206 or another networked environment. One or more clients 203 are coupled directly to Internet 206, or indirectly coupled to Internet 206 through a client appliance 205.

Clients 203 or 204 may comprise a PC, a workstation, a network appliance device, a web pad, a wireless phone or other communication device, a set-top box, etc. Client appliance 205 may be implemented on a service gateway (e.g., a router, bridge, switch, or other network device) in the LAN or as an end point on the LAN. Clients 203 or client appliance 205 may run software and reside in a LAN or other networked environment. In one embodiment, the precache memory is part of client 203. In another embodiment, the precache memory is part of client appliance 205, or on another client machine that is linked to client 203 by way of a LAN or some other networking subsystem.

Client 203 or client appliance 205 may be coupled to the Internet by a modem link, a digital subscriber line (DSL), a cable modem, a (fixed) wireless connection, fiber, etc. This coupling may be either a direct connection or an indirect connection through a router, switch, or other similar device.

One or more clients may be peers. A peer is a “nearby” or local host, such as, for example, a host in the same LAN, a host connected to the same ISP, or any other networked device offering reasonable connectivity (e.g., bandwidth, latency). In FIG. 2, any or all of clients 203 or client appliances 205 may be in peer relationships with each other.

Master controller 201 is coupled to Internet 206 or some other networked environment. Master controller 201 is a server host or cluster of server hosts along with storage (e.g., network-attached storage, database engines), and typically resides in a controlled environment at one or a few locations strategically placed in the Internet to allow for reasonable connectivity.

Master controller 201 can discover content that becomes available at content servers 202. One way in which new content can be discovered is through direct reports 210 coming from content servers 202. Such direct reports 210 could be generated periodically, or in response to an event on server 202 (e.g., new content being placed on the server by the server administrator). Direct reports 210 are usually generated by software running on servers 202.

Another way in which master controller 201 can discover the availability of content on content servers 202 is by use of a server appliance 207 that is colocated at server 202's site, or close to it. Server appliance 207 can frequently crawl (220) through the content on server 202 locally to check for the availability of new content. It can then report new changes, or provide a summary of all content on server 202, by sending messages 230 to master controller 201. In this configuration, server 202 need not run any special software that communicates directly with the master controller.
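One way a server appliance could turn repeated local crawls into change reports is to diff successive snapshots. The sketch below assumes, hypothetically, that a crawl produces a mapping from object path to file bytes, that each object is fingerprinted with SHA-256, and that changes are reported as `("add", path)` and `("remove", path)` tuples mirroring the add and remove messages of the protocol in FIG. 6. The function names are illustrative.

```python
import hashlib

def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Fingerprint one crawl: object path -> SHA-256 hex digest of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}

def diff_snapshots(previous: dict[str, str], current: dict[str, str]) -> list[tuple[str, str]]:
    """Compare two crawls and return ("add"|"remove", path) reports.

    A modified object is reported as a remove followed by an add, so the
    controller's list of objects stays consistent without a separate update message.
    """
    reports = []
    for path in sorted(previous.keys() - current.keys()):
        reports.append(("remove", path))
    for path in sorted(current.keys() - previous.keys()):
        reports.append(("add", path))
    for path in sorted(previous.keys() & current.keys()):
        if previous[path] != current[path]:
            reports.extend([("remove", path), ("add", path)])
    return reports
```

Sending only the diff keeps messages 230 small; a full summary (every path in the current snapshot) can still be sent occasionally to resynchronize the controller.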

A third way in which master controller 201 can discover the availability of content on content servers 202 is by directly crawling (240) the content available on content server 202. This crawling operation 240 is similar to the way in which web search engines retrieve files from an Internet web server.

FIG. 3 is an alternative view of FIG. 2 illustrating the gathering of profiles. Referring to FIG. 3, clients 303 maintain profiles for local users. In one embodiment, the profile is built by observing user access patterns and, from those access patterns, determining what types of content the end user will want to access in the future. In another embodiment, the profile may be built up or augmented by information provided directly by the master controller 301, the end users, or both. A local network administrator may also add to the profile.

Profiles for one or more clients 304 may also be maintained by a client appliance 305. In this case, it would not be necessary for clients 304 to run special software to collect and report profiles.

Clients 303 report on the profiles they maintain to master controller 301 using messages 310. Similarly, client appliances 305 report on the profiles they maintain to master controller 301 using messages 320. Messages 310 and 320 can be generated periodically, or in response to some event (e.g., a request from master controller 301).

FIG. 4 is an alternative view of FIG. 2 illustrating the initiation of downloads directly from the server. Referring to FIG. 4, master controller 401 uses its knowledge of what content is available on content servers (as described in FIG. 2), and its knowledge of client profiles for different clients 403 and 404, to initiate downloads of content that will likely be needed in the future at clients 403 and 404. Master controller 401 sends messages 410 to clients 403 and client appliances 405; these messages contain commands to initiate downloads of content data from locations 402 specified in the messages 410. Clients 403 then send a message 420 to the content server 402 from which the content is to be downloaded. Content servers 402 then respond to these download requests 420 by returning the content data 430 to clients 403. Client appliances 405 retrieve content data from servers 402 in a similar manner.

FIG. 5 is an alternative view of FIG. 2 illustrating the initiation of downloads from peers. Referring to FIG. 5, master controller 501 uses its knowledge of which clients 503 and client appliances 505 have already downloaded specific content objects to initiate downloads of content directly from a peer client. In one embodiment, master controller 501 sends a message 510 to client 503.1 to initiate download of a content object from peer client 503.2. Client 503.1 sends a message 520 to peer client 503.2 to retrieve the specified content. Client 503.2 then acts as a content server by responding to request 520 by sending the specified content data 530 directly back to client 503.1. In another embodiment, master controller 501 can send a command to a client to upload a specified content object to a specified peer client; this is useful when the client sending the content data cannot be directly contacted by the requesting client, perhaps because it resides behind a firewall. Client appliances 505 can get content from peer clients 503 and/or other client appliances in a similar manner.

In an alternate embodiment of the invention, clients 503 or client appliances 505 may directly query master controller 501 for new content objects that match their local profiles, and receive from master controller 501 a list of the new objects that are available, as well as their locations (e.g., content servers 502, peer clients 503, or peer client appliances 505). These queries may occur periodically or in response to some external event (e.g., a request from the master controller). Clients 503 or client appliances 505 can then select a suitable location from which to download the content directly. In this embodiment, master controller 501 need not maintain profiles for all the clients, and messages 310 and 320 would be unnecessary.

In one embodiment, master controller 501 knows four things: 1) the content clients want, based on profiles received from clients; 2) the new content that is available; 3) the location of the new content (e.g., servers, carrier edge caches, peers, etc.); and 4) network information such as, for example, network connectivity (e.g., network topology information, bandwidth, delay, and dynamically changing snapshots of network congestion and/or utilization). Using this information, master controller 501 schedules downloads of new content objects to clients 503 and client appliances 505. Such downloads may take the form of commands such as, for example, “get object from server 1” or may take the form of instructions such as, for example, “instruct client 2 to obtain the object from client 1”. The network information and the information about which downloads are taking place allow master controller 501 to do provisioning that takes resource availability into account.
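A scheduler combining items 3 and 4 might choose, for each download, the holder expected to deliver fastest. The following is a simplified, hypothetical model rather than the described scheduler: it assumes measured path bandwidth per (holder, client) pair, for example from the bandwidth tests of FIG. 6, and assumes a holder's upstream is shared equally among the uploads it is already serving.

```python
from typing import Optional

def pick_source(client: str, holders: list[str],
                bandwidth_kbps: dict[tuple[str, str], float],
                active_uploads: dict[str, int]) -> Optional[str]:
    """Return the holder with the best expected rate to `client`, or None if no path is known."""
    best: Optional[str] = None
    best_rate = 0.0
    for holder in holders:
        # Measured bandwidth from holder to client; unknown paths count as unusable.
        rate = bandwidth_kbps.get((holder, client), 0.0)
        # Assumption: the holder's capacity is split evenly across concurrent uploads.
        effective = rate / (active_uploads.get(holder, 0) + 1)
        if effective > best_rate:
            best, best_rate = holder, effective
    return best
```

With 300 Kbits/s from `server1` and 800 Kbits/s from peer `c1` (already serving one upload), the peer still wins at an effective 400 Kbits/s; if `c1` were serving three uploads, the origin server would be chosen instead. A production scheduler would also weigh delay, congestion snapshots, and firewall reachability.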

In one embodiment, master controller 501 is able to coordinate downloads so that, before a download of content to a particular client completes, another download of that content from that particular client may begin. This kind of pipelining of downloads can significantly reduce the delay before a content object is replicated to a potentially very large number of clients and/or client appliances.
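The benefit of pipelining can be estimated with a simple relay-chain model. The model is an assumption, not taken from the text: each client downloads from the previous one, every hop has the same rate, there is no protocol overhead, and in the pipelined case a client forwards each chunk as soon as it has received it. Under these assumptions, store-and-forward replication to the last of `hops` clients takes `hops * S / R`, while pipelined replication takes `S / R + (hops - 1) * c / R` for object size `S`, rate `R`, and chunk size `c`.

```python
from typing import Optional

def chain_delay_s(size_mbit: float, rate_mbps: float, hops: int,
                  chunk_mbit: Optional[float] = None) -> float:
    """Seconds until the last client in a relay chain holds the whole object.

    chunk_mbit=None models store-and-forward: each client starts serving
    only after its own download has completed.
    """
    full_transfer = size_mbit / rate_mbps
    if chunk_mbit is None:
        return hops * full_transfer
    # The whole object streams through once; each later hop lags by one chunk.
    return full_transfer + (hops - 1) * chunk_mbit / rate_mbps
```

For a 1000 Mbit object relayed across 10 clients over 1 Mbit/s links, `chain_delay_s(1000, 1, 10)` gives 10000 seconds, while `chain_delay_s(1000, 1, 10, 8)` with 8 Mbit chunks gives 1072 seconds.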

Clients 503 download content objects from the locations specified in messages 410 or 510 from the master controller. For example, in one embodiment, client 503 may download bandwidth-intensive content such as, for example, movies, video, software, images, sound files, etc. Client 503 stores the content locally in one or more precache memories. The precache memory may be part of a client 503 (or at least be accessible to it over a fast link, for example, over a LAN). The content may be downloaded on the end user's premises. In one embodiment, the downloading occurs without excessive interference with other interactive network traffic.

A user request may be generated (e.g., from a web browser) to download specific content from the network. Client software running on an end system can observe these requests. Alternatively, a client appliance can observe such a request originating from an end system to which it is connected (e.g., through a LAN). Clients or client appliances monitoring these requests can detect when a request is for a content object that is in the local precache memory of the client or client appliance. If such a request is detected, clients or client appliances can intercept the request and satisfy it by returning the stored (precached) content object from their local precache memory. If the request is for a content object that is not in the precache memory, clients or client appliances can forward the request on to its original destination (e.g., content server, carrier-edge cache, etc.). Thus, requests for a specific type of bandwidth-intensive content are intercepted. In one embodiment, clients and client appliances are configurable to intercept requests for a certain type of content object.

Thus, with content cached locally, clients and client appliances detect requests for embedded objects, check their precache memory to determine whether the embedded objects are stored locally, and return the object from the precache memory if available. If the content is not available, the request is sent out into the network (e.g., the Internet) to an appropriate location where the content or an alternate representation of the content may be found.

An Exemplary Protocol

FIG. 6 illustrates one embodiment of a protocol for exchanging information between master controller 501 and a client, a server, and a peer. Referring to FIG. 6, when the client first boots up, it registers (601). Registration by the client involves sending information that enables master controller 501 to coordinate the precaching activity.

In one embodiment, once registration has been completed, all but one of the remaining operations are controlled from master controller 501 (e.g., in response to a NOC request or message). Thus, master controller 501 sends a request to which the client replies, with the exception of one situation.

After registration, master controller 501 requests the profile from the client (602). In one embodiment, master controller 501 indicates the size of the profile it is willing or able to accept. Then the client sends the profile to master controller 501 (622). In one embodiment, the profile is a list of links (e.g., 50 to 100 URLs) in order of access frequency, with links that are accessed more often at the top of the profile. If the profile is larger than the maximum specified by master controller 501, the client may shrink the profile by removing links that have been accessed less frequently (e.g., those at the bottom of the list of links).
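A frequency-ordered, size-limited profile of this kind can be produced with the standard library's `collections.Counter`. This sketch assumes the client keeps a simple log of requested URLs and that `max_links` is the size limit indicated by master controller 501 in its profile request; both names are hypothetical.

```python
from collections import Counter

def build_profile(access_log: list[str], max_links: int) -> list[str]:
    """Return up to `max_links` URLs, most frequently accessed first.

    Counter.most_common orders equal counts by first occurrence, so ties
    are broken by which URL was seen first in the log.
    """
    counts = Counter(access_log)
    return [url for url, _ in counts.most_common(max_links)]
```

Truncating with `most_common(max_links)` is equivalent to removing links from the bottom of the full list, which is the shrinking rule described above.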

Similarly, master controller 501 communicates with the web servers (e.g., content providers). A server registers with master controller 501 (603). In response to the registration, master controller 501 requests state information from the server (604). This request may be generated by master controller 501 periodically while the server remains registered. In response to the request, the server sends state information (605). The state information may include a listing of all content objects that are linked through the site. The list may be limited to only those content objects that are rich media objects or bandwidth-intensive objects in terms of downloading. Every time new content is added or removed, the server sends an add message (606) or a remove message (607) to master controller 501 to update the list (e.g., in a database) that master controller 501 maintains of the content objects linked through the site.

In one embodiment, master controller 501 initiates one or more maintenance tests on the client (621). These tests are well known in the art. For example, master controller 501 can request a traceroute from the client to some other Internet address, or a bandwidth test from the client to a different Internet address. Master controller 501 uses these tests to determine network connectivity and resource availability. With this information, master controller 501 is able to obtain information about the network topology, a network map, etc., as listed above. Note that such information may be provided to master controller 501 directly, without testing to discover it. At this point, master controller 501 has information about the network topology, the state of the servers, and the clients.

In one embodiment, master controller 501 may send a reset cache message (609) if a cache checksum doesn't match a previously defined or calculated value.
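
The text does not specify how the cache checksum is computed. As an assumption for illustration only, the sketch below hashes each cached URL together with a SHA-256 digest of its content, in sorted URL order, so the result does not depend on storage order. `cache_checksum` and `needs_reset` are hypothetical helpers.

```python
import hashlib


def cache_checksum(objects: dict[str, bytes]) -> str:
    """Digest over (URL, content digest) pairs, taken in sorted URL order."""
    h = hashlib.sha256()
    for url in sorted(objects):
        h.update(url.encode("utf-8"))
        h.update(b"\0")                              # separator between URL and digest
        h.update(hashlib.sha256(objects[url]).digest())
    return h.hexdigest()


def needs_reset(objects: dict[str, bytes], expected: str) -> bool:
    """True when the controller would send a reset cache message (609)."""
    return cache_checksum(objects) != expected
```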

Master controller 501 keeps track of where the content is. Specifically, master controller 501 keeps track of each piece of content (e.g., a video clip) and the identity of the servers and/or clients on which it is located. When master controller 501 determines that a client is to download an object from a particular location, it sends the client an initiate download message (610) that identifies the object and the object's location. In one embodiment, the initiate download message includes the name of the object (e.g., its uniform resource identifier (URI)) and its natural location (e.g., a URL corresponding to its location on its origin server, a URL corresponding to some peer client, etc.).

In response to the download message, the client initiates the download by sending a get data command to a peer (611). After the peer begins to send the data (623), the client sends a message to master controller 501 indicating that the download has started (612). The download may take a while. Once the download has been completed, the client sends a message to master controller 501 indicating that the download has been completed (613). This allows master controller 501 to know which downloads are occurring at any time.
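
Because the client reports both the start (612) and the completion (613) of each transfer, the controller can keep a small state machine per (client, object) pair. The sketch below is a hypothetical illustration of that bookkeeping; the class and state names are assumptions, and out-of-order reports are simply rejected.

```python
from enum import Enum


class DownloadState(Enum):
    REQUESTED = "requested"  # initiate download message (610) sent to the client
    STARTED = "started"      # client reported that data is flowing (612)
    COMPLETED = "completed"  # client reported completion (613)


# Allowed transitions; anything else is treated as a protocol error.
_NEXT = {
    DownloadState.REQUESTED: DownloadState.STARTED,
    DownloadState.STARTED: DownloadState.COMPLETED,
}


class DownloadTracker:
    def __init__(self):
        self.state = {}  # (client, uri) -> DownloadState

    def initiate(self, client, uri):
        self.state[(client, uri)] = DownloadState.REQUESTED

    def started(self, client, uri):
        self._advance(client, uri, DownloadState.STARTED)

    def completed(self, client, uri):
        self._advance(client, uri, DownloadState.COMPLETED)

    def active(self):
        """Downloads currently in progress, as the controller sees them."""
        return [key for key, s in self.state.items() if s is DownloadState.STARTED]

    def _advance(self, client, uri, new):
        current = self.state.get((client, uri))
        if _NEXT.get(current) is not new:
            raise ValueError(f"unexpected {new.value} for {uri} on {client}")
        self.state[(client, uri)] = new
```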

If the peer is behind a firewall, the client cannot connect to the peer directly and download the data from behind the firewall. In that case, master controller 501 sends a message (615) directly to the peer to indicate that the peer is to upload the new content to the client. Master controller 501 also sends a message to the client to expect an upload (614) from some peer. A particular session key or number may be used to correlate uploaded information received by the client from other peer clients with the correct download identified by master controller 501. The peer sends the upload (616). Finally, the client sends a heartbeat message (617) to master controller 501 so that master controller 501 knows that the client is up and running.
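
The choice between a normal pull and a firewall-traversing push can be sketched as a function that returns the messages the controller would send. This is a minimal sketch under stated assumptions: the message names, the `Peer` record, and the use of `secrets.token_hex` to generate the session key are illustrative, not taken from the described protocol.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Peer:
    address: str
    behind_firewall: bool


def plan_transfer(client: str, peer: Peer, uri: str) -> list[tuple[str, str, dict]]:
    """Messages (recipient, kind, fields) the controller sends for one transfer."""
    if not peer.behind_firewall:
        # Normal case (610): the client pulls from the peer itself (611).
        return [(client, "initiate_download", {"uri": uri, "location": peer.address})]
    # Peer cannot accept connections: tell the client to expect an upload (614)
    # and the peer to push to the client (615), correlated by one session key.
    session = secrets.token_hex(8)
    return [
        (client, "expect_upload", {"uri": uri, "session": session}),
        (peer.address, "upload_to", {"uri": uri, "target": client, "session": session}),
    ]


def accept_upload(expected: dict[str, str], session: str, uri: str) -> bool:
    """Client side: accept an upload only if it matches a session it was told to expect."""
    return expected.get(session) == uri
```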

In one embodiment, the messages are small. Because almost all requests originate from master controller 501, master controller 501 is able to schedule all the downloads to ensure that no single client or network link is overloaded.
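
As a rough illustration of such scheduling, the hypothetical greedy admission rule below caps concurrent downloads per client and per network link, deferring anything over a cap to a later round. The limits and the (client, object, link) request shape are assumptions; the text does not say how the controller enforces its schedule.

```python
from collections import Counter


def schedule(requests, max_per_client=1, max_per_link=2):
    """Greedy admission of (client, uri, link) download requests.

    A request is admitted only if neither its client nor its network link is
    already at its limit; everything else is deferred to a later round.
    """
    per_client, per_link = Counter(), Counter()
    admitted, deferred = [], []
    for client, uri, link in requests:
        if per_client[client] < max_per_client and per_link[link] < max_per_link:
            per_client[client] += 1
            per_link[link] += 1
            admitted.append((client, uri, link))
        else:
            deferred.append((client, uri, link))
    return admitted, deferred
```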

Building User Profiles

The client creates a profile for an end user that is coupled to the client. The profile may comprise a list of resource locators (e.g., URLs), object type, object size, and a time stamp associated with the URLs to provide information as to when the end user accessed the resource. In one embodiment, the profile comprises URLs and access times, identifying web sites or portions of web sites and when the user tends to access them.

The client may build the user profile in a number of different ways. In one embodiment, a user profile may be developed based on the user's browsing patterns. The client tracks the user's access patterns, which may include the web sites the user visits, the times the user accesses those sites, and/or the frequency of access; this information is then used to define the user's browsing patterns. In one embodiment, combining this information with information about the average size of certain types of objects and the bandwidth available to any given site allows a determination of when to begin checking a site for new or updated content, so that such content is available locally at the time it is likely to be accessed. If bandwidth is available (e.g., during the night), the system (e.g., the master controller, client, etc.) can check for updates more frequently.
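
The timing decision described above reduces to simple arithmetic: estimate the transfer time from object size and available bandwidth, then start that long, plus a safety margin, before the expected access. The sketch assumes a fixed multiplicative `margin`, which is a hypothetical parameter not given in the text.

```python
from datetime import datetime, timedelta


def start_check_time(expected_access: datetime, avg_object_bytes: int,
                     bandwidth_bps: float, margin: float = 1.5) -> datetime:
    """Latest time to begin fetching so the object is local before expected_access."""
    transfer_seconds = avg_object_bytes * 8 / bandwidth_bps
    return expected_access - timedelta(seconds=transfer_seconds * margin)


print(start_check_time(datetime(2000, 5, 5, 19, 0), 50_000_000, 1_000_000))
# 2000-05-05 18:50:00
```

For an expected access at 19:00, a 50 MB object over a 1 Mbit/s path takes 400 seconds; with the assumed 1.5 margin the check starts at 18:50.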

Profiles may be configured or built using input from a network administrator or manager, such as master controller 501 in the NOC. For example, master controller 501 could add or remove URLs and access times. To make a change to the profile, the client would be accessible via the network, for example by an Internet service provider (ISP), application service provider (ASP), or content provider, and the profile could be configured through that access. An example of such use is an ISP service that provides a video movie to an end user once a day. The individual could then choose whether to watch the movie, because the movie would already have been downloaded. Profiles may also be configured by a content server or a content provider.

Alternatively, the profile may be set manually by an individual, such as the user. The user may manually add specific URLs and access times to the profile. For example, if a user checks a set of web sites at a predetermined time during the day, the user can configure the network access gateway to access those web sites before that time each day to obtain updated or new content. Such additions to the profile improve the accuracy of the precaching.

A profile may be developed for a user using a combination of two or more of these profile building methods. Further, priorities can be assigned to URLs stored in the precache memory in case of conflicting access times. In one embodiment, user-configured URLs have priority over learned URLs (developed from tracking user access patterns) and over externally configured URLs, such as those configured by a network administrator (e.g., from master controller 501).
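
The priority rule can be sketched as a merge in which the highest-priority source wins whenever two sources list the same URL with different access times. The relative order of learned versus administrator-configured URLs is an assumption made here for concreteness; the text only places user-configured URLs above both.

```python
# Lower value wins. "learned" above "admin" is an assumption for this sketch.
SOURCE_PRIORITY = {"user": 0, "learned": 1, "admin": 2}


def merge_profiles(*sources):
    """Merge (kind, {url: access_time}) sources into {url: (access_time, kind)}.

    Sources are visited from highest to lowest priority, and setdefault keeps
    the first entry seen for each URL, so higher-priority sources win conflicts.
    """
    merged = {}
    for kind, entries in sorted(sources, key=lambda s: SOURCE_PRIORITY[s[0]]):
        for url, when in entries.items():
            merged.setdefault(url, (when, kind))
    return merged


profile = merge_profiles(
    ("admin", {"http://a.example.com/": "06:00", "http://b.example.com/": "07:00"}),
    ("user", {"http://a.example.com/": "08:00"}),
    ("learned", {"http://b.example.com/": "09:00", "http://c.example.com/": "10:00"}),
)
print(profile["http://a.example.com/"])  # ('08:00', 'user')
```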

In one embodiment, only one precaching client is running on a system at any one time. An open application program interface (API) to the profile may be provided to allow third parties to add URLs to user profiles, to schedule downloads, and to use the services provided by the precaching architecture for their applications.

Locating New Content

In one embodiment, clients 503 check for new content by subscribing with master controller 501 in the NOC, which gives them automatic notification when new content becomes available. This is advantageous on large web sites with millions of clients because it reduces, or even minimizes, the time and resources used in crawling.

Using the information stored in the user profiles, master controller 501 periodically checks for new content. To facilitate this, client 503 may have previously passed updates to its profile, as shown by arrow 310 in FIG. 3. Master controller 501 maintains a list of web sites and their embedded media objects. This list is compiled using updated information from content providers 502, as shown by arrows 210 and 230 in FIG. 2, or by crawling web sites from the NOC, as shown by arrow 240 in FIG. 2. The crawling process is similar to the way in which some Internet search engines create indices of web pages.

In one embodiment, content providers 502 support the system by periodically crawling all available web pages on their servers locally to look for new object content. This local crawl can be implemented in software, hardware, or a combination of both.

The content providers 502 provide a summary of changes to master controller 501. Alternatively, such information may be provided directly to a client. The summary information may comprise the link, time, type, and size of each object. The summary may include a list of URLs for those objects. The master controller compares the content in the list with the profile information (e.g., the list maintained by the network access gateway) to determine what content has changed and, therefore, what content, if any, is to be downloaded. In one embodiment, the result of the local crawl is made available in a special file, namely an update index. Master controller 501 analyzes the update index to find the new download candidates. In one embodiment, content providers 502 build an update index manually.
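
Comparing an update index against profile information might look like the following sketch. The `UpdateEntry` fields mirror the summary contents listed above (link, time, type, size); the `page` field and the `cached` modification-time map are hypothetical additions used to relate objects to profiled pages and to the copies already held.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateEntry:
    link: str        # URL of the object
    page: str        # hypothetical: page or site the object is linked from
    modified: float  # time the object last changed (epoch seconds)
    mime_type: str
    size: int        # bytes


def download_candidates(update_index, profile_pages, cached):
    """Entries linked from profiled pages that are new or newer than the local copy.

    cached maps an object URL to the modification time of the copy already held.
    """
    return [
        e for e in update_index
        if e.page in profile_pages and cached.get(e.link, float("-inf")) < e.modified
    ]
```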

Master controller 501 collects and aggregates the summaries. In one embodiment, each content provider 502 sends its summary to master controller 501. In such a case, the clients need only contact one server to download summary information for groups of participating content servers in the network. In one embodiment, master controller 501 may unicast or multicast the information to one or more clients 503.

In an embodiment in which clients maintain their own profiles, such as described in U.S. application Ser. No. 09/566,068, entitled “Intelligent Content Precaching,” filed May 5, 2000, and issued as U.S. Pat. No. 6,917,960, assigned to the corporate assignee of the present invention and incorporated herein by reference, clients 503 crawl a web site directly and search for new content objects. Clients 503 perform a crawl operation by periodically checking web servers indicated in the profile for new or updated content objects that they predict end users or other local devices will access in the near future. In one embodiment, in such a case, a client begins with a URL stored in the profile and follows links into web pages down to a configurable level.

In one embodiment, the controller obtains the first page from the server and determines whether any bandwidth intensive objects are present. In one embodiment, a bandwidth intensive object may be identified by its size. If bandwidth intensive embedded objects exist, the controller determines whether new versions are available and downloads them. When new content objects have been identified, the controller indicates to the client to download only the bandwidth intensive (e.g., large) new content objects, as they become available. The content objects obtained as a result of crawling are stored locally. In one embodiment, the precache memory storing such objects also stores their URLs, data type, size, the time when the data was acquired, and the actual data itself. This process is transparent to the network, and no change is needed to the content servers or the network to operate the precaching client.
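
A depth-limited crawl that reports only bandwidth-intensive objects can be sketched as a breadth-first search. To keep the example self-contained, the site is modeled as an in-memory link map and a size table instead of live HTTP fetches; both, and the byte threshold used to classify an object as bandwidth intensive, are assumptions.

```python
from collections import deque


def crawl(start, links, sizes, max_depth, min_bytes):
    """Breadth-first crawl from start, following links at most max_depth hops.

    links maps a page URL to the URLs it links to; sizes maps an object URL to
    its size in bytes. Objects of at least min_bytes are reported as bandwidth
    intensive; smaller targets are treated as pages and expanded if depth allows.
    """
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        url, depth = queue.popleft()
        for target in links.get(url, []):
            if target in seen:
                continue
            seen.add(target)
            if sizes.get(target, 0) >= min_bytes:
                found.append(target)           # candidate for precaching
            elif depth + 1 < max_depth:
                queue.append((target, depth + 1))
    return found


site = {"/": ["/news", "/big.mov"], "/news": ["/clip.mov", "/deep"], "/deep": ["/deep.mov"]}
size_table = {"/big.mov": 50_000_000, "/clip.mov": 20_000_000, "/deep.mov": 30_000_000}
print(crawl("/", site, size_table, max_depth=2, min_bytes=1_000_000))
# ['/big.mov', '/clip.mov']
```

Raising `max_depth` to 3 also reaches `/deep.mov`, which corresponds to the "configurable level" mentioned above.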

In an alternative embodiment, each new and/or updated content object isdownloaded independent of size (after determining if the content objectis a new version).

Some or all of these crawling techniques may be used in the same networked environment. For example, client 503 may crawl one or more sites to determine whether any of the content objects have changed, while receiving information from master controller 501 or from web servers that crawl their own sites to identify updated or new content, and while caches in the network or content servers provide updated and new content to client 503.

Downloading

Master controller 501 in the NOC maintains a database of available objects and their physical locations. When a new object is available for downloading to client 503, master controller 501 determines the most suitable location from which client 503 may download the object. In one embodiment, master controller 501 does this by analyzing the database and the client's Internet protocol (IP) address, and relating this to the network topology, map, and connectivity information known to it. A scheduler in the NOC returns a download trigger to client 503. The trigger provides information that enables client 503 to download the object. This trigger information, or pointer, may comprise a location and an object name (e.g., a URL).

A requested object can be downloaded from a variety of sources, e.g., a peer, a carrier edge cache, or the original server. In FIG. 5, arrow 530 represents a download from a peer. Master controller 501 determines the most suitable host based on one or more parameters. In one embodiment, these parameters include peer-to-peer hop count and bandwidth.

If no suitable peer is available (e.g., if the request is the first request for an object or if suitable peers are too far away), the object can also be downloaded from a server installed on the carrier edge if the content provider supports carrier edge caching. If there is no suitable peer and no cache can be found, the object is downloaded from the original content provider server 502. Regardless of the location from which it downloads, client 503 downloads the object in the background without excessively interfering with interactive traffic.
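
The source-selection order above (nearby peer first, then carrier edge cache, then origin server) can be sketched as follows. The `Candidate` record, the hop-count cutoff, and the tie-breaking by bandwidth are illustrative assumptions; the text names hop count and bandwidth as parameters but does not fix how they are weighed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    location: str          # URL or address of a host holding the object
    kind: str              # "peer", "edge" (carrier edge cache), or "origin"
    hops: int              # network hop count from the requesting client
    bandwidth_kbps: float  # estimated available bandwidth to the client


def choose_source(candidates: list[Candidate], max_peer_hops: int = 8) -> Optional[Candidate]:
    """Pick a download source: nearby peer, else edge cache, else origin server."""
    peers = [c for c in candidates if c.kind == "peer" and c.hops <= max_peer_hops]
    if peers:
        # Fewest hops first; among equals, the highest bandwidth.
        return min(peers, key=lambda c: (c.hops, -c.bandwidth_kbps))
    for kind in ("edge", "origin"):
        pool = [c for c in candidates if c.kind == kind]
        if pool:
            return max(pool, key=lambda c: c.bandwidth_kbps)
    return None  # object not known anywhere
```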

Intercept

In one embodiment, client 503 transparently analyzes the web pages downloaded by the end users and rewrites embedded URLs in those pages to point to the locally precached object instead of the original object. When rewriting URLs, specific marks (e.g., a different link color or an additional icon) can be added for objects available in the precache. When the user finally selects (e.g., clicks on) a link, the browser automatically loads the object from the precache instead of the content provider server.
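
A minimal sketch of the URL rewriting, assuming a mapping from original URLs to local precache locations. It uses a regular expression over quoted `href`/`src` attributes for brevity; this is a simplification, and a robust implementation would use a real HTML parser. The function name and the `file:///precache/` location in the example are hypothetical.

```python
import re

# Quoted href/src attribute values; (?P=q) requires the same closing quote.
_ATTR = re.compile(r'(?P<attr>\b(?:href|src)\s*=\s*)(?P<q>["\'])(?P<url>.*?)(?P=q)', re.I)


def rewrite_links(html: str, precache: dict[str, str]) -> str:
    """Point href/src attributes at local copies for URLs present in the precache."""
    def sub(m):
        local = precache.get(m.group("url"))
        if local is None:
            return m.group(0)  # not precached: leave the original URL untouched
        return f'{m.group("attr")}{m.group("q")}{local}{m.group("q")}'
    return _ATTR.sub(sub, html)


page = '<a href="http://v.example.com/clip.mov">clip</a> <img src=\'http://v.example.com/logo.gif\'>'
print(rewrite_links(page, {"http://v.example.com/clip.mov": "file:///precache/clip.mov"}))
```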

Client 503 may intercept requests for content objects in different ways. For example, in one embodiment, client 503 monitors requests, and when there is a request for a content object stored by (or for access by) client 503, it takes the request and responds to it as if the request had originally been addressed to client 503. Thus, the end user generating the request receives the content object as if it had come from the original server hosting the content. This is one example of an implicit way to translate an access for a content object into an access to a locally cached object. Requests may also be intercepted explicitly, where an end system is aware of the new location of the object (e.g., through a DNS lookup). In one embodiment, client 503 checks for certain types of requests that correspond to content available in the precache memory (e.g., all *.mov files). If such a request is detected, the client searches the precache memory for the requested URL.
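
The type-based check described above could look like this sketch, which intercepts only requests whose path matches a configured pattern (here `*.mov`, as in the text) and falls through to the network on a precache miss. The pattern tuple and function name are assumptions for illustration.

```python
from fnmatch import fnmatch
from typing import Optional
from urllib.parse import urlsplit

# Request types served from the precache (the text's example is all *.mov files).
INTERCEPT_PATTERNS = ("*.mov",)


def intercept(url: str, precache: dict[str, bytes]) -> Optional[bytes]:
    """Return the locally cached body for url, or None to forward the request."""
    path = urlsplit(url).path.lower()
    if not any(fnmatch(path, pattern) for pattern in INTERCEPT_PATTERNS):
        return None               # not a precached type: pass through untouched
    return precache.get(url)      # None on a miss: fetch from the origin server
```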

Applications

The peer-to-peer precaching technique facilitates the provision of premium services, such as explicit downloads, mapped content, aggregated access statistics, etc. The premium service of explicit downloads is provided by installing triggers to pull the customer's content (e.g., all new clips on a web site immediately go to all sites with the web site's URL in their profiles).

Mapped content allows customers to offer dense content dedicated to precache-enabled users. In one embodiment, this is implemented by offering a separate high resolution file of a video clip, which is not linked into any web page but is available to the master controller when it checks a target web site for new content. When the user clicks on a video icon on a web page, the transparent precache technology delivers the high resolution version instead of the potentially low resolution version.

With aggregated access statistics, access statistics and user profile statistics are provided to content providers and distributors. For example, individual user access profiles are retained on the customer premises, and only the statistics are reported. Because only aggregated statistics are reported, privacy concerns are avoided.

Besides enhancing traditional web sites with high-quality video, the precaching technique can be applied in other areas, such as advertising and DVD on demand. Running decent-quality video advertising over the Internet has not been possible so far. A broadband connection can barely deliver a single low-quality video stream, and consumers would certainly not want video ads to eat up their interactive bandwidth. Thus, advertisers are currently limited to using “banner ads,” which are mostly implemented as blinking images (animated GIFs). With precaching, advertisements can be downloaded while the link is otherwise unused. Thus, full-motion ads can be downloaded in the background and embedded in web pages without exhausting the interactive bandwidth. The peer-to-peer video precaching technique helps advertisers succeed in their hunt for eyeballs. In addition, the precaching technique allows advertisers and content providers to retain their ability to keep track of the number of “hits” of the embedded ads.

The precaching technique also makes online distribution of DVD video feasible. An online rental model would avoid the hassle of late fees, midnight video store runs, and rewinding charges. MPEG-2 video, the coding standard used on DVDs, requires an average bandwidth of 3.7 Mbits/sec. The average length of a DVD movie is two hours. An average movie therefore needs approximately 3.5 Gbytes of disk space. Over a 500 kbits/sec Internet connection, three hours of DVD-quality movie can be downloaded in 24 hours. If the connection is twice as fast (e.g., 1 Mbit/sec), three full DVD movies can be delivered over the Internet in a day.
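
The figures above can be checked with a few lines of arithmetic. The sketch uses only the numbers quoted in the text and decimal units (1 GB = 10^9 bytes). It gives about 3.33 GB per two-hour movie, in line with the approximate figure above, roughly 3.2 hours of video per day at 500 kbit/s, and roughly 6.5 hours (a little over three two-hour movies) at 1 Mbit/s.

```python
DVD_BITRATE_BPS = 3.7e6   # average MPEG-2 bit rate quoted in the text
MOVIE_SECONDS = 2 * 3600  # average movie length quoted in the text
DAY_SECONDS = 24 * 3600


def movie_gigabytes(bitrate_bps, seconds):
    """Storage for one movie in decimal gigabytes (1 GB = 10**9 bytes)."""
    return bitrate_bps * seconds / 8 / 1e9


def video_hours_per_day(link_bps, bitrate_bps=DVD_BITRATE_BPS):
    """Hours of video at bitrate_bps that a link_bps connection moves in 24 hours."""
    return link_bps * DAY_SECONDS / bitrate_bps / 3600


print(f"one movie: {movie_gigabytes(DVD_BITRATE_BPS, MOVIE_SECONDS):.2f} GB")
for link_bps in (500e3, 1e6):
    hours = video_hours_per_day(link_bps)
    movies = hours * 3600 / MOVIE_SECONDS
    print(f"{link_bps / 1e3:.0f} kbit/s: {hours:.1f} h/day, {movies:.1f} movies")
# one movie: 3.33 GB
# 500 kbit/s: 3.2 h/day, 1.6 movies
# 1000 kbit/s: 6.5 h/day, 3.2 movies
```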

Thus, a technique of personalized content delivery using peer-to-peer precaching has been described. In particular, this technique saves content providers bandwidth on their server farms and carrier edge caches. It also improves the interactive experience of a large number of web sites. While the previous discussion focuses on clients running on end-system PCs, the technique can be implemented to run in access gateways, home gateways, set-top boxes, etc.

An Exemplary Computer System

FIG. 7 is a block diagram of an exemplary computer system (e.g., PC, workstation, etc.). Referring to FIG. 7, computer system 700 may comprise an exemplary client 503 or server 502 computer system. Computer system 700 comprises a communication mechanism or bus 711 for communicating information, and a processor 712 coupled with bus 711 for processing information. Processor 712 includes, but is not limited to, a microprocessor such as a Pentium™, PowerPC™, or Alpha™.

System 700 further comprises a random access memory (RAM) or other dynamic storage device 704 (referred to as main memory) coupled to bus 711 for storing information and instructions to be executed by processor 712. Main memory 704 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 712.

Computer system 700 also comprises a read only memory (ROM) and/or other static storage device 706 coupled to bus 711 for storing static information and instructions for processor 712, and a data storage device 707, such as a magnetic disk or optical disk and its corresponding disk drive. Data storage device 707 is coupled to bus 711 for storing information and instructions.

Computer system 700 may further be coupled to a display device 721, such as a cathode ray tube (CRT) or liquid crystal display (LCD), coupled to bus 711 for displaying information to a computer user. An alphanumeric input device 722, including alphanumeric and other keys, may also be coupled to bus 711 for communicating information and command selections to processor 712. An additional user input device is cursor control 723, such as a mouse, trackball, trackpad, stylus, or cursor direction keys, coupled to bus 711 for communicating direction information and command selections to processor 712, and for controlling cursor movement on display 721.

Another device that may be coupled to bus 711 is hard copy device 724, which may be used for printing instructions, data, or other information on a medium such as paper, film, or similar types of media. Furthermore, a sound recording and playback device, such as a speaker and/or microphone, may optionally be coupled to bus 711 for audio interfacing with computer system 700. Another device that may be coupled to bus 711 is a wired/wireless communication capability 725 for communicating with a phone or handheld palm device.

Note that any or all of the components of system 700 and associated hardware may be used in the present invention. However, it can be appreciated that other configurations of the computer system may include some or all of the devices.

Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that any particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of various embodiments are not intended to limit the scope of the claims, which in themselves recite only those features regarded as essential to the invention.

In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

We claim:
 1. A method comprising: (a) identifying, by a controller intermediary to a plurality of client devices and one or more content servers, a plurality of content stored on the one or more content servers; (b) determining, by the controller, that a content of the plurality of content corresponds to a resource identified by a profile of each subscriber of a plurality of subscribers; and (c) transmitting, by the controller, a message to the plurality of client devices corresponding to each subscriber to initiate a download of the content from a location identified by the message.
 2. The method of claim 1, wherein (a) further comprises crawling, by the controller, content of a content server of the one or more content servers to identify the content.
 3. The method of claim 1, wherein (a) further comprises receiving, by the controller, a report of a content server of the one or more content servers to identify the content.
 4. The method of claim 1, wherein (a) further comprises receiving, by the controller, a report on content of the one or more content servers from a server appliance that is configured to crawl the one or more content servers.
 5. The method of claim 1, wherein the client device comprises an end node of a local network and has a browser operated by the subscriber.
 6. The method of claim 1, wherein (b) further comprises receiving, by the controller, the profile of each subscriber from a client device of each subscriber.
 7. The method of claim 1, wherein (b) further comprises receiving, by the controller, the profile of each subscriber from a client appliance in communication with the plurality of client devices.
 8. The method of claim 1, wherein (b) further comprises querying a database of profiles to determine the plurality of subscribers having profiles corresponding to the content.
 9. The method of claim 1, wherein (c) further comprises transmitting, by the controller, the message in accordance with a download schedule determined by the controller based on network information.
 10. The method of claim 9, wherein the network information comprises one or more of the following: network topology, bandwidth, delay, congestion and utilization.
 11. A system comprising: a controller device intermediary to a plurality of client devices and one or more content servers, the controller device configured to: identify a plurality of content stored on the one or more content servers; determine that a content of the plurality of content corresponds to a resource identified by a profile of each subscriber of a plurality of subscribers; and transmit a message to the plurality of client devices corresponding to each subscriber to initiate a download of the content from a location identified by the message.
 12. The system of claim 11, wherein the controller device is further configured to crawl content of a content server of the one or more content servers to identify the content.
 13. The system of claim 11, wherein the controller device is further configured to receive a report of a content server of the one or more content servers to identify the content.
 14. The system of claim 11, wherein the controller device is further configured to receive a report on content of the one or more content servers from a server appliance that is configured to crawl the one or more content servers.
 15. The system of claim 11, wherein the client device comprises an end node of a local network and has a browser operated by the subscriber.
 16. The system of claim 11, wherein the controller device is further configured to receive the profile of each subscriber from a client device of each subscriber.
 17. The system of claim 11, wherein the controller device is further configured to receive the profile of each subscriber from a client appliance in communication with the plurality of client devices.
 18. The system of claim 11, wherein the controller device is further configured to query a database of profiles to determine the plurality of subscribers having profiles corresponding to the content.
 19. The system of claim 11, wherein the controller device is further configured to transmit the message in accordance with a download schedule determined by the controller based on network information.
 20. The system of claim 19, wherein the network information comprises one or more of the following: network topology, bandwidth, delay, congestion and utilization.